1. Introduction

The Framingham Heart Study (Gordon & Kannel, 1968; Truett, Cornfield & Kannel, 1967) is an ongoing prospective study of the development of cardiovascular disease. This study has been the basis for a considerable amount of epidemiologic research, much of it through the use of logistic regression. For example, there has been considerable emphasis on analyzing the probability of developing coronary heart disease (CHD). In this instance, the response is binary:

$$Y = \begin{cases} 1 & \text{if the person develops CHD,} \\ 0 & \text{if the person does not develop CHD.} \end{cases} \qquad (1.1)$$

Many of the analyses have attempted to relate baseline risk factors to the probability of developing CHD; these risk factors include systolic and diastolic blood pressure, serum cholesterol, history of smoking, etc. Ordinarily, at some point in the analysis, multiple logistic regression is employed.

It is well known that many of the baseline risk factors are measured with error; systolic blood pressure is a good example (Rosner & Polk, 1979). One of us was asked by a number of investigators and at least one referee whether such measurement errors could substantially affect the logistic regression estimates and, if so, what could be done to correct for the measurement error. The present study is an outgrowth of these questions, although many important practical facets of the problem remain to be investigated.
In an interesting paper, Michalek and Tripathi (1980) discuss the effect of measurement error on ordinary logistic regression; see also Ahmed and Lachenbruch (1978). Michalek and Tripathi conclude that ordinary logistic regression will not be too badly disturbed by measurement error as long as such error is moderate. We feel that our methods, by providing alternatives to ordinary logistic regression, will help the experimenter obtain a more precise understanding of the effect of the measurement errors, especially if they are severe.

Our model is as follows. We have a sample of N persons from a particular population, e.g., males aged 45-54. The ith person in the sample is assumed to have a vector $x_i$ of baseline risk factors, with the probability of developing disease (CHD) given by

$$P(Y_i = 1 \mid x_i) = G(x_i'\beta_0), \qquad i = 1, \ldots, N, \qquad (1.2)$$

where G(·) is a known distribution function such as

$$G(a) = \{1 + \exp(-a)\}^{-1} \quad \text{(logistic regression)},$$

$$G(a) = \Phi(a) \quad \text{(probit regression)},$$

where Φ(·) is the standard normal distribution function. We will return to probit regression later, but it is important to remember that probit and logistic regression usually give similar results (Halperin, Wu & Gordon, 1979; Gordon et al., 1977).
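As a quick numerical check of this similarity, the Python sketch below compares the logistic and probit choices of G on a grid. The factor 1.702 applied to the logistic argument is a conventional approximation brought in here for illustration only; it is not part of model (1.2), and the function names are ours.

```python
import math
from statistics import NormalDist

def logistic_G(a: float) -> float:
    """Logistic distribution function G(a) = {1 + exp(-a)}^(-1)."""
    return 1.0 / (1.0 + math.exp(-a))

def probit_G(a: float) -> float:
    """Standard normal distribution function Phi(a)."""
    return NormalDist().cdf(a)

def max_gap(scale: float, lo: float = -6.0, hi: float = 6.0, n: int = 12001) -> float:
    """Largest |logistic_G(scale * a) - probit_G(a)| over an even grid on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    gaps = (abs(logistic_G(scale * (lo + k * step)) - probit_G(lo + k * step))
            for k in range(n))
    return max(gaps)

if __name__ == "__main__":
    # Unscaled, the curves differ by more than 0.1 (near |a| = 1.3); with the
    # conventional factor 1.702 on the logistic argument they stay within 0.01.
    print(f"scale 1.000: {max_gap(1.0):.4f}")
    print(f"scale 1.702: {max_gap(1.702):.4f}")
```

Fitted probabilities from the two links therefore tend to agree closely, while the coefficients differ roughly by a scale factor.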
We will partition the risk factors $x_i$ into components observed without and with error, so that

$$x_i' = (w_i', z_i'), \qquad \beta_0' = (\beta_{01}', \beta_{02}'). \qquad (1.3)$$

In (1.3), the $\{w_i\}$ can be observed at nearly exact levels; age and sex are examples. The $\{z_i\}$, in contrast, are measured with nontrivial error and cannot be observed; rather, we only observe

$$Z_i = z_i + u_i. \qquad (1.4)$$

To begin the discussion, we assume that the $\{u_i\}$ are independently and normally distributed with mean zero and nonsingular covariance matrix $\Sigma_M$.

When the risk factors $\{z_i\}$ observed with error are assumed to be constants, the model is usually called the functional model (Kendall & Stuart, 1979). In this instance, $\beta_0$ and the N values $\{z_i\}$ are unknown parameters, and the number of these unknown parameters increases with the sample size N, so that classical maximum likelihood theory does not apply. In fact, in the next section we show that in a very simple logistic regression model, the functional maximum likelihood estimate (MLE) of $\beta_0$ is not consistent when $\Sigma_M$ is known. This is in contrast to the functional MLE for linear regression, which is generally consistent.
In Section 3 we study the more tractable structural model, wherein the $\{z_i\}$ are themselves independent with common distribution function F, which we also initially suppose is that of a normal random vector with mean $\mu_z$ and covariance $\Sigma_z$. In effect, we study a conditional likelihood, replacing (1.2) by $P(Y_i = 1 \mid w_i, Z_i)$.

In Section 4, the non-normal case is discussed. In Section 5, we present a small Monte-Carlo study. In Section 6, we analyze the effect of measurement error on predicting the probability of CHD on the basis of systolic blood pressure.

2. The Functional Case

Consider logistic regression through the origin,

$$P(Y_i = 1 \mid c_i) = \{1 + \exp(-\alpha_0 c_i)\}^{-1}, \qquad (2.1)$$

where $\alpha_0$ and the $\{c_i\}$ are scalars. Because of measurement error, we observe

$$C_i = c_i + v_i, \qquad (2.2)$$

where the errors $\{v_i\}$ are normally distributed with mean zero and variance $\sigma_M^2$ ($0 < \sigma_M^2 < \infty$). For purposes of this example, we will assume $\sigma_M^2$ is known to the investigator.
When the measurement error variance is known, the functional errors-in-variables maximum likelihood estimate of α for linear regression is generally consistent and asymptotically normally distributed. We now outline why this happy circumstance does not carry over to logistic regression.

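The maximum likelihood estimator (MLE) of $\alpha_0$ for the functional model (2.1)-(2.2) with $\sigma_M^2$ known maximizes

$$\sum_{i=1}^N \left[ Y_i \log G(\alpha c_i) + (1 - Y_i) \log\{1 - G(\alpha c_i)\} \right] - (2\sigma_M^2)^{-1} \sum_{i=1}^N (C_i - c_i)^2, \qquad (2.3)$$

where

$$G(t) = \{1 + \exp(-t)\}^{-1}$$

is the logistic distribution function. For this functional model, the parameters are $(\alpha_0, \{c_i\})$. For given α, setting the derivative of (2.3) with respect to $c_i$ to zero and using $G'(t) = G(t)\{1 - G(t)\}$ shows that the estimates of $\{c_i\}$ satisfy

$$\hat c_i(\alpha) = C_i - \sigma_M^2\, \alpha \left[ G\{\alpha\, \hat c_i(\alpha)\} - Y_i \right], \qquad i = 1, \ldots, N. \qquad (2.4)$$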


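The MLE $\hat\alpha$ satisfies (2.4) and

$$\sum_{i=1}^N \hat c_i(\hat\alpha) \left[ G\{\hat\alpha\, \hat c_i(\hat\alpha)\} - Y_i \right] = 0. \qquad (2.5)$$

If the MLE exists and is unique, and if

$$N^{1/2}(\hat\alpha - \alpha_0)$$

is asymptotically normally distributed with mean zero and positive, finite asymptotic variance, then one can prove (see Appendix) that

$$N^{-1} \sum_{i=1}^N \hat c_i(\alpha_0) \left[ G\{\alpha_0\, \hat c_i(\alpha_0)\} - Y_i \right] \xrightarrow{p} 0. \qquad (2.6)$$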

In (2.6), $\hat c_i(\alpha_0)$ satisfies (2.4). It turns out that (2.6) does not hold even in the following simple case: take $c_i = \pm 0.6$ (+ if i is odd, - otherwise) and $\sigma_M^2 = 1$ (we used numerical integration to check this).
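The Python sketch below shows one way such a check could be organized. It solves (2.4) for $\hat c_i(\alpha_0)$ by bisection and evaluates the expectation of a single term of (2.6) by one-dimensional quadrature over $v_i$; for the alternating design $c_i = \pm 0.6$, the average in (2.6) converges to the mean of the two expectations. The true slope $\alpha_0$ is not stated in the text, so the value used below is a hypothetical choice, and the quadrature scheme and function names are ours rather than the authors'.

```python
import math

def G(t: float) -> float:
    """Logistic distribution function {1 + exp(-t)}^(-1)."""
    return 1.0 / (1.0 + math.exp(-t))

def c_hat(C: float, y: int, alpha: float, sigma2: float) -> float:
    """Solve (2.4), u = C - sigma2 * alpha * {G(alpha * u) - y}, for u.

    H(u) = u - C + sigma2 * alpha * {G(alpha * u) - y} is strictly increasing,
    and |G - y| < 1 puts its root inside [C - sigma2|alpha|, C + sigma2|alpha|],
    so bisection on that bracket always converges.
    """
    lo, hi = C - sigma2 * abs(alpha), C + sigma2 * abs(alpha)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid - C + sigma2 * alpha * (G(alpha * mid) - y) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def expected_term(c: float, alpha0: float, sigma2: float, n: int = 2001) -> float:
    """E[c_hat(alpha0) * {G(alpha0 * c_hat(alpha0)) - Y}] for one true value c.

    Y ~ Bernoulli(G(alpha0 * c)) and C = c + v with v ~ N(0, sigma2); the
    expectation over v uses a midpoint rule on x = v / sigma in [-8, 8].
    """
    sigma = math.sqrt(sigma2)
    p1 = G(alpha0 * c)
    width = 16.0 / n
    total = 0.0
    for k in range(n):
        x = -8.0 + (k + 0.5) * width
        weight = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) * width
        for y, prob in ((1, p1), (0, 1.0 - p1)):
            u = c_hat(c + sigma * x, y, alpha0, sigma2)
            total += prob * weight * u * (G(alpha0 * u) - y)
    return total

if __name__ == "__main__":
    ALPHA0 = 1.0   # hypothetical slope; the text does not give alpha_0 here
    SIGMA2 = 1.0   # sigma_M^2 = 1, as in the text
    # c_i = +0.6 for odd i and -0.6 for even i, so the average in (2.6)
    # converges to the mean of the two expectations below.
    limit = 0.5 * (expected_term(0.6, ALPHA0, SIGMA2) + expected_term(-0.6, ALPHA0, SIGMA2))
    print(f"limit of the average in (2.6) at alpha_0 = {ALPHA0}: {limit:.5f}")
```

Because G(-t) = 1 - G(t), the two expectations coincide, so the limit equals the expected term at c = 0.6 alone; (2.6) holds only if that value is zero.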

The preceding argument shows that even in the simplest of cases, the functional logistic errors-in-variables MLE will not be both unique and asymptotically normal in the usual $\sqrt{N}$ sense. We believe this phenomenon carries over to other forms of the distribution function, such as probit regression. In fact, for the model (2.1)-(2.2) with $\sigma_M^2$ known, we have been unable to construct any consistent and asymptotically normal estimate of $\alpha_0$.


3. Structural Case: Normal Distribution

The model is given by (1.2)-(1.4), but in the structural case we eliminate the nuisance parameters $\{z_i\}$ by assuming they are independent and normally distributed with mean vector $\mu_z$ and covariance matrix $\Sigma_z$. The error vectors $\{u_i\}$ are also assumed to be independent (of one another and of $\{z_i\}$) normal random vectors with mean 0 and covariance $\Sigma_M$. For the moment we shall assume that $\mu_z$, $\Sigma_z$ and $\Sigma_M$ are known; we discuss more realistic cases near the end of the section. For a given general distribution function G in (1.2), we denote the marginal likelihood of the observed data by

$$L(G, \beta_{01}, \beta_{02}, \Sigma_M, \mu_z, \Sigma_z).$$


Defining the dimension of $\beta_{02}$ to be p, this marginal likelihood, which can more intuitively be written as the product of the conditional likelihoods for $Y_i$ given $Z_i$, is proportional to

$$\prod_{i=1}^N S_{i+}^{Y_i}\, S_{i-}^{1 - Y_i}, \qquad (3.1)$$

where

$$S_{i+} = A_1 \int G(w_i'\beta_{01} + z'\beta_{02}) \exp\{-0.5 (Z_i - z)'\Sigma_M^{-1}(Z_i - z)\} \exp\{-0.5 (z - \mu_z)'\Sigma_z^{-1}(z - \mu_z)\}\, dz, \qquad (3.2)$$

$$A_1 = (2\pi)^{-p} (|\Sigma_M|\, |\Sigma_z|)^{-1/2}, \qquad (3.3)$$

and $S_{i-}$ is defined by replacing G(·) by 1 - G(·) in (3.2).


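Detailed calculations show that

$$S_{i+} = A_{3i}\, t_{i+}, \qquad S_{i-} = A_{3i}(1 - t_{i+}),$$

$$t_{i+} = \int G\{w_i'\beta_{01} + d_{1i}'A_2\beta_{02} + (\beta_{02}'A_2\beta_{02})^{1/2} v\}\, \phi(v)\, dv, \qquad (3.4)$$

where

$$\phi(v) = (2\pi)^{-1/2} \exp(-0.5 v^2),$$

$$A_2 = (\Sigma_M^{-1} + \Sigma_z^{-1})^{-1},$$

$$d_{1i} = \Sigma_M^{-1} Z_i + \Sigma_z^{-1} \mu_z,$$

$$d_{2i} = \mu_z'\Sigma_z^{-1}\mu_z + Z_i'\Sigma_M^{-1} Z_i,$$

$$A_{3i} = A_1 |A_2|^{1/2} (2\pi)^{p/2} \exp(-0.5\, d_{2i} + 0.5\, d_{1i}'A_2 d_{1i}).$$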

In effect, the calculation of the likelihood depends only on being able to evaluate (3.4). This is no easy matter in general for the logistic function $G(t) = \{1 + \exp(-t)\}^{-1}$, although if the number of variables measured with error is small, (3.4) can in principle be evaluated by numerical integration. For probit regression, (3.4) can be evaluated explicitly; in fact,

$$t_{i+} = \Phi\left\{ (w_i'\beta_{01} + d_{1i}'A_2\beta_{02})(1 + \beta_{02}'A_2\beta_{02})^{-1/2} \right\}. \qquad (3.5)$$
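For a single risk factor measured with error (p = 1), (3.4) and (3.5) can be checked against each other numerically. The Python sketch below evaluates (3.4) by a midpoint rule for an arbitrary G and compares the probit case with the closed form (3.5). The argument eta_w stands for $w_i'\beta_{01}$; the function names and input values are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

def logistic(t: float) -> float:
    """Logistic distribution function {1 + exp(-t)}^(-1)."""
    return 1.0 / (1.0 + math.exp(-t))

def posterior_terms(Z, mu_z, var_z, var_m):
    """Scalar (p = 1) versions of A_2 and d_1i * A_2 from Section 3."""
    A2 = 1.0 / (1.0 / var_m + 1.0 / var_z)
    d1 = Z / var_m + mu_z / var_z
    return A2, d1 * A2                   # d1 * A2 is the mean of z_i given Z_i

def t_plus_quadrature(G, eta_w, beta02, Z, mu_z, var_z, var_m, n=4001):
    """(3.4) for any G, by a midpoint rule on v in [-10, 10]; eta_w = w_i' beta_01."""
    A2, shift = posterior_terms(Z, mu_z, var_z, var_m)
    center = eta_w + shift * beta02
    scale = math.sqrt(beta02 * A2 * beta02)
    width = 20.0 / n
    total = 0.0
    for k in range(n):
        v = -10.0 + (k + 0.5) * width
        total += G(center + scale * v) * math.exp(-0.5 * v * v)
    return total * width / math.sqrt(2.0 * math.pi)

def t_plus_probit(eta_w, beta02, Z, mu_z, var_z, var_m):
    """Closed form (3.5) for G = Phi."""
    A2, shift = posterior_terms(Z, mu_z, var_z, var_m)
    return PHI((eta_w + shift * beta02) / math.sqrt(1.0 + beta02 * A2 * beta02))

if __name__ == "__main__":
    args = (-1.0, 1.0, 0.4, 0.0, 0.25, 0.125)   # hypothetical inputs
    print("probit, quadrature  :", t_plus_quadrature(PHI, *args))
    print("probit, closed form :", t_plus_probit(*args))
    print("logistic, quadrature:", t_plus_quadrature(logistic, *args))
```

For the probit link the two routes agree to quadrature accuracy; the same routine gives the logistic value of (3.4) by quadrature.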


Since logistic and probit regression generally give similar estimates of event probabilities (Halperin, Wu & Gordon, 1979), in the rest of the paper we confine our discussion to probit regression.

In most instances, the nuisance parameters $\mu_z$, $\Sigma_z$ and $\Sigma_M$ will be unknown. Joint estimation of these parameters and $\beta_0$ through the likelihood (3.1) may be computationally feasible by such devices as the EM algorithm (Dempster, Laird & Rubin, 1977), although this remains to be explored. A simpler and reasonable alternative is the method of pseudo maximum likelihood estimation (PMLE; see Gong & Samaniego, 1981). Computing PMLEs for $\beta_0$ simply consists of finding estimates of $\mu_z$, $\Sigma_z$ and $\Sigma_M$ and plugging these estimates into (3.1). One obvious estimate for $\mu_z$ is

$$\hat\mu_z = N^{-1} \sum_{i=1}^N Z_i, \qquad (3.6)$$

while an estimate for $\Sigma_z + \Sigma_M$ is

$$\widehat{(\Sigma_z + \Sigma_M)} = N^{-1} \sum_{i=1}^N (Z_i - \hat\mu_z)(Z_i - \hat\mu_z)'. \qquad (3.7)$$

One common way to estimate $\Sigma_M$ is by replication. For example, suppose that each variable subject to error, but with unknown error covariance, is measured twice. Call these replicates $Z_{i1}$ and $Z_{i2}$. Then, in terms of the earlier notation,

$$Z_i = (Z_{i1} + Z_{i2})/2. \qquad (3.8)$$

Since the replicates $\{Z_{ir}\}$ share a common measurement error covariance, we can compute the estimates

$$\hat\Sigma_M = \text{sample covariance of } \{(Z_{i1} - Z_{i2})/2\}, \qquad (3.9)$$

$$\hat\Sigma_z = \widehat{(\Sigma_z + \Sigma_M)} - \hat\Sigma_M. \qquad (3.10)$$
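A minimal Python sketch of the replication estimates (3.6)-(3.10) for a scalar risk factor measured twice follows. Two choices are ours: (3.7) uses the divisor N as written, while the "sample covariance" in (3.9) is taken with divisor N - 1. The function name and the toy data are hypothetical.

```python
import random
from statistics import fmean

def moment_estimates(reps):
    """Estimates (3.6)-(3.10) for a scalar risk factor measured twice.

    reps: list of (Z_i1, Z_i2) pairs. Returns (mu_z, var_z, var_m), where var_m
    estimates the error variance of the average Z_i = (Z_i1 + Z_i2) / 2.
    """
    n = len(reps)
    Z = [(a + b) / 2.0 for a, b in reps]                  # (3.8)
    mu_z = fmean(Z)                                       # (3.6)
    total = sum((x - mu_z) ** 2 for x in Z) / n           # (3.7), divisor N
    half = [(a - b) / 2.0 for a, b in reps]
    m = fmean(half)
    var_m = sum((d - m) ** 2 for d in half) / (n - 1)     # (3.9), divisor N - 1
    var_z = total - var_m                                 # (3.10); may be negative
    return mu_z, var_z, var_m

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: z ~ N(1, 0.25) and each replicate error ~ N(0, 0.25),
    # so the averaged Z_i has error variance 0.125.
    reps = []
    for _ in range(20000):
        z = rng.gauss(1.0, 0.5)
        reps.append((z + rng.gauss(0.0, 0.5), z + rng.gauss(0.0, 0.5)))
    print(moment_estimates(reps))
```

Note that (3.10) can come out negative when the measurement error is large relative to the sample size; the plug-in step into (3.1) then needs some guard.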

The substitutions (3.6)-(3.10) provide an easy way to obtain consistent and asymptotically normal estimates of $\beta_0$, say $\hat\beta_0$. There are many ways to estimate the covariance matrix of $\hat\beta_0$. One method is the bootstrap (Efron, 1979; 1981); this cousin of the jackknife merely requires enough computer time to calculate $\hat\beta_0$ for sufficiently many randomly drawn (with replacement) samples of size N from the original data.

Alternatively, one could use the theory of PMLEs given by Gong and Samaniego (1981) (actually, one must generalize their equations (2.8) and (2.6) slightly). The difficulty with this approach is also computational, as it involves taking derivatives of the log of (3.1) with respect to $(\beta_0, \mu_z, \Sigma_z, \Sigma_M)$.
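A generic bootstrap loop of the kind described above might look like this in Python. It is a sketch under simple assumptions (one record per person, persons resampled with replacement); for the PMLE, the estimator passed in would redo (3.6)-(3.10) and the maximization of (3.1) on every resample. The function name is hypothetical.

```python
import random
from statistics import fmean, stdev

def bootstrap_se(rows, estimator, n_boot=500, seed=0):
    """Nonparametric bootstrap standard error of a scalar estimator.

    rows: one record per person, e.g. (Y_i, w_i, Z_i1, Z_i2).
    estimator: maps a list of records to a float.
    Each resample draws len(rows) records with replacement.
    """
    rng = random.Random(seed)
    n = len(rows)
    replicates = [estimator(rng.choices(rows, k=n)) for _ in range(n_boot)]
    return stdev(replicates)

if __name__ == "__main__":
    # Toy check: for the mean of 400 standard normal draws the true
    # standard error is 1 / sqrt(400) = 0.05.
    data_rng = random.Random(7)
    sample = [data_rng.gauss(0.0, 1.0) for _ in range(400)]
    print(bootstrap_se(sample, fmean))
```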

4. Structural Case: Non-Normal Distributions

In the previous section we made the assumption that both the measurement errors $\{u_i\}$ and the structural parameters $\{z_i\}$ are normally distributed. One may wish to take a more nonparametric view and not assume that either $\{u_i\}$ or $\{z_i\}$ are normal random variables. We will outline a method for this problem, restricting ourselves to the following situation:

(4.1a) The random variable $z_i$ subject to measurement error is scalar.

(4.1b) The variable subject to measurement error is replicated as in (3.8).

Of course, if the common density h(z) of the $\{z_i\}$ were known, conditional likelihood methods could be used as in the previous section. However, we are interested in situations for which h(z) is not completely known. One very simple device is to assume that h(z) has a simple two-term Edgeworth expansion, e.g.,

$$h(z) = (2\pi\sigma_z^2)^{-1/2} \exp(-0.5\, s^2) \left\{ 1 + c_3 (s^3 - 3s)/6 + c_4 (s^4 - 6s^2 + 3)/24 \right\}, \qquad s = (z - \mu_z)/\sigma_z, \qquad (4.2)$$

where $\mu_z$ and $\sigma_z^2$ are the mean and variance of the $\{z_i\}$, and $c_3$ and $c_4$ are standard measures of skewness and kurtosis. Because of the replication assumed in (4.1b), these four parameters are easily estimated, giving us a sample-based density with which to work. The multivariate case can also be handled; see Johnson and Kotz (1972).
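The expansion (4.2) is easy to code directly. The Python sketch below evaluates it and checks numerically that it integrates to one and keeps mean $\mu_z$, variance $\sigma_z^2$ and skewness $c_3$. Reading $c_3$ and $c_4$ as the skewness and excess-kurtosis coefficients is our interpretation of "standard measures", and the step of estimating them from the replicates is not shown.

```python
import math

def edgeworth_density(z, mu, var, c3, c4):
    """Two-term expansion (4.2): normal density times a Hermite-polynomial correction.

    c3, c4: skewness and excess-kurtosis coefficients. For large |c3| or |c4| the
    expression can become negative in the tails, so it should be checked in use.
    """
    s = (z - mu) / math.sqrt(var)
    normal = math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi * var)
    he3 = s ** 3 - 3.0 * s                 # Hermite polynomial He_3
    he4 = s ** 4 - 6.0 * s ** 2 + 3.0      # Hermite polynomial He_4
    return normal * (1.0 + c3 * he3 / 6.0 + c4 * he4 / 24.0)

def moments(mu, var, c3, c4, n=20001):
    """Total mass, mean, variance and skewness of (4.2), by a midpoint rule."""
    sd = math.sqrt(var)
    lo, hi = mu - 12.0 * sd, mu + 12.0 * sd
    width = (hi - lo) / n
    pts = [lo + (k + 0.5) * width for k in range(n)]
    wts = [edgeworth_density(z, mu, var, c3, c4) * width for z in pts]
    mass = sum(wts)
    mean = sum(z * w for z, w in zip(pts, wts)) / mass
    m2 = sum((z - mean) ** 2 * w for z, w in zip(pts, wts)) / mass
    m3 = sum((z - mean) ** 3 * w for z, w in zip(pts, wts)) / mass
    return mass, mean, m2, m3 / m2 ** 1.5

if __name__ == "__main__":
    # Hypothetical parameter values, on the scale of Section 5.
    print(moments(0.0, 0.25, c3=0.3, c4=0.2))
```

The checks hold because the Hermite terms are orthogonal to 1, s and $s^2$ under the normal weight, so only the third moment picks up $c_3$.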

Given that we either know or can estimate h(z), the method of estimation we propose is based on nonlinear regression. It has the appealing feature that we do not need to know the distribution of the measurement errors $\{u_i\}$ in defining the estimator. In particular, we will turn the problem around and consider the distribution of $z_i$ given $Y_i$ and $w_i$ (recall $x_i' = (w_i', z_i)$). Let $h(z \mid Y_i, w_i)$ be the conditional density of $z_i$ given $Y_i$ and $w_i$. This is a complicated but easily computed function of h(z), $Y_i$, $w_i$ and $\beta_0$. Define the conditional means of the $\{z_i\}$ by

$$r(\beta_{01}, \beta_{02}, y, w_i) = E(z_i \mid Y_i = y, w_i). \qquad (4.3)$$

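Because the measurement errors have mean zero and are assumed unrelated to $Y_i$ and $w_i$, the observed $Z_i$ has the same conditional mean, $E(Z_i \mid Y_i, w_i) = r(\beta_{01}, \beta_{02}, Y_i, w_i)$. In analogy with nonlinear regression, for weights $\text{wgt}_1$ and $\text{wgt}_2$ we propose minimizing

$$\sum_{i=1}^N \left[ \text{wgt}_1 \{Z_i - r(\beta_{01}, \beta_{02}, 1, w_i)\}^2 Y_i + \text{wgt}_2 \{Z_i - r(\beta_{01}, \beta_{02}, 0, w_i)\}^2 (1 - Y_i) \right]. \qquad (4.4)$$

Actually computing the estimates of $\beta_{01}$ and $\beta_{02}$ is quite feasible because it relies only on nonlinear regression. Inference based on the estimates is complex; we have no simple large-sample theory and suggest that bootstrap methodology be used.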
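The two ingredients of this estimator can be sketched in Python as follows, for scalar $w_i$ and $z_i$ and the probit choice of G used in the rest of the paper. The quadrature for (4.3), the default weights of 1, the squared-residual form of (4.4) and the function names are our assumptions; in practice the objective would be handed to a nonlinear least-squares routine such as the SAS procedure NLIN used in Section 5.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

def r_cond_mean(b01, b02, y, w, h, lo, hi, n=4001):
    """(4.3): E(z | Y = y, w) when P(Y = 1 | w, z) = Phi(w * b01 + z * b02).

    h is the density of z (known, or estimated e.g. via (4.2)), integrated over
    [lo, hi] by a midpoint rule; the grid spacing cancels in the ratio.
    """
    num = den = 0.0
    step = (hi - lo) / n
    for k in range(n):
        z = lo + (k + 0.5) * step
        p = PHI(w * b01 + z * b02)
        weight = (p if y == 1 else 1.0 - p) * h(z)
        num += z * weight
        den += weight
    return num / den

def objective(b01, b02, data, h, lo, hi, wgt1=1.0, wgt2=1.0):
    """(4.4): weighted squared deviations of Z_i from r, split by Y_i.

    data holds (Y_i, w_i, Z_i) triples with scalar w_i (e.g. w_i = 1 for an
    intercept); r is cached because it depends only on (Y_i, w_i).
    """
    cache = {}
    total = 0.0
    for y, w, Z in data:
        if (y, w) not in cache:
            cache[(y, w)] = r_cond_mean(b01, b02, y, w, h, lo, hi)
        total += (wgt1 if y == 1 else wgt2) * (Z - cache[(y, w)]) ** 2
    return total
```

When h is normal, $E(z \mid Y = 1)$ has a closed form, $\mu_z + \sigma_z^2 b\,\phi(m) / \{(1 + b^2\sigma_z^2)^{1/2}\Phi(m)\}$ with $m = (a + b\mu_z)/(1 + b^2\sigma_z^2)^{1/2}$, which gives a convenient check on the quadrature.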


5. A Monte-Carlo Study

A simulation study was performed for the probit model. Specifically,

$$P(Y_i = 1 \mid z_i) = \Phi(z_i - 1), \qquad i = 1, \ldots, 200,$$

and we observe

$$Z_{ij} = z_i + u_{ij}, \qquad j = 1, 2.$$

Here $\{z_i\}$ and $\{u_{ij}\}$ were independent normal random variables with mean zero and variances $\sigma_z^2 = \sigma_M^2 = 0.25$. Thus, the simulation concerns a situation in which the measurement error is large, as is the sample size N = 200. All computations were performed at the National Institutes of Health Computing Center using the SAS statistical package, specifically the procedure NLIN. The experiments were replicated 100 times. The estimates of $\mu_z$, $\sigma_z^2$ and $\sigma_M^2$ were obtained as described by (3.6)-(3.10).

In Table 1, we list the means and mean square errors for the estimates of intercept (= -1.0) and slope (= 1.0) obtained by the usual probit regression ($\hat\beta_{0P}$, $\hat\beta_{1P}$) and by probit errors-in-variables (EIV) regression ($\hat\beta_{0E}$, $\hat\beta_{1E}$). This table is a classical expression of the trade-off between bias and variance, especially for the slopes. The usual probit slopes are badly biased but not particularly variable. The probit EIV slopes are relatively unbiased but quite variable; overall, they result in an approximately 23% gain over the usual probit regression in terms of mean square error.

Often more important than the estimates of individual parameters is the behavior of the estimated risk or probability function as a function of the true value of the predictor:

$$\text{Probit:} \quad \Phi(\hat\beta_{0P} + \hat\beta_{1P}\, z),$$

$$\text{Probit EIV:} \quad \Phi(\hat\beta_{0E} + \hat\beta_{1E}\, z).$$

In Fig. 1, we plot the average values of the risk or probability as a function of z, as well as the true risk function; these were averages over the 100 simulations for different values of z, smoothed by spline interpolation. Note that the probit EIV risk function is approximately unbiased, while the usual probit risk function is badly biased for those at highest risk.
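Formula (3.5) also suggests where the usual probit fit should settle in this design. When z is normal and there is no intercept-free covariate beyond $w_i = 1$, (3.5) says $P(Y = 1 \mid Z)$ is itself a probit in Z with attenuated coefficients, so the usual probit MLE converges to those. The Python sketch below computes the limits and the implied risk next to the true risk $\Phi(z - 1)$. The derivation of the limits from (3.5) is ours, not the authors', and since the text does not say whether $\sigma_M^2 = 0.25$ is the error variance of a single replicate or of the average $Z_i$, both readings are evaluated; the helper name is hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

def naive_probit_limits(b0, b1, mu_z, var_z, var_m):
    """Limits of the usual probit fit of Y on the error-prone Z (hypothetical helper).

    Assumes z ~ N(mu_z, var_z), Z = z + u with u ~ N(0, var_m), and
    P(Y = 1 | z) = Phi(b0 + b1 * z). By (3.5) with w_i = 1, P(Y = 1 | Z) is then
    exactly Phi(a + b * Z), so the usual probit MLE converges to (a, b).
    """
    lam = var_z / (var_z + var_m)        # reliability ratio
    A2 = lam * var_m                     # equals (1/var_m + 1/var_z)^(-1)
    s = math.sqrt(1.0 + b1 * b1 * A2)
    a = (b0 + b1 * mu_z * (1.0 - lam)) / s
    b = b1 * lam / s
    return a, b

if __name__ == "__main__":
    # Section 5 design: b0 = -1, b1 = 1, mu_z = 0, var_z = 0.25. Both readings
    # of the error variance of the averaged Z_i are shown.
    for var_m in (0.125, 0.25):
        a, b = naive_probit_limits(-1.0, 1.0, 0.0, 0.25, var_m)
        print(f"var_m = {var_m}: intercept -> {a:.3f}, slope -> {b:.3f}")
        for z in (0.0, 0.5, 1.0):
            print(f"  z = {z}: true risk {PHI(z - 1.0):.3f}, "
                  f"limiting usual-probit risk {PHI(a + b * z):.3f}")
```

Under either reading the limiting usual-probit curve falls below the true risk for large z, in line with the bias described for Fig. 1; the limits can also be set against the "Mean" row for the usual probit in Table 1.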

In estimating the risk function, it turns out that there is not nearly the trade-off between bias and variance that there is for estimating individual parameters. In Fig. 2, we plot the mean square error functions as a function of z; again, mean square errors were calculated for various z and then the function was interpolated by a spline available in the SAS procedure GPLOT. For the high-risk cases, the probit EIV risk function is noticeably better than the usual probit risk function. In Fig. 3, the ratios of mean square errors for the probit versus probit EIV risk functions are plotted.

We also experimented with the nonlinear least squares methodology of Section 4. We followed the suggestions of Section 4, with the exception that we assumed normality. The resulting estimates had almost the same mean square error properties as the probit EIV estimators, a fact which we found both surprising and encouraging.


6. An Example

To get some idea of the possible effects of measurement error in a more realistic context, we considered some of the data from the Framingham Heart Study (Gordon & Kannel, 1968). The Framingham Study has followed a sample of the male and female population of Framingham (Massachusetts) biennially since around 1950 in order to study the development of cardiovascular disease. For purposes of this paper, the data used were on men aged 45-54, with systolic blood pressures taken at exam four. Individuals were called diseased cases if they developed coronary heart disease within the six-year interval after exam four. There were 513 cases, of whom … were eventually considered to be diseased cases.

For the average of the two systolic blood pressures, we estimated

$$\hat\sigma_z^2 = 1.14, \qquad \hat\sigma_M^2 = 0.10 = 0.09\, \hat\sigma_z^2.$$

Hence, the apparent measurement error was quite small, with the usual probit and probit EIV estimates of slope, intercept and risk being only minimally different. At this point, we realized that we were ignoring other sources of variation which might be more appropriately classified as "measurement error." Specifically, one might think of the variance of systolic blood pressure as

$$\sigma_s^2 + \sigma_T^2 + \sigma_{ME}^2,$$

where

$\sigma_s^2$ = population variance of the "true" systolic blood pressures calculated at a fixed time of day;

$\sigma_T^2$ = exam time-of-day effect; within individuals there is a diurnal effect for blood pressure, see Comstock (1957) and Gould et al. (1931). Other effects may also be noted, e.g., those which could be attributed to the nurse or physician reading the blood pressure or to the subject's physical or psychological disposition;

$\sigma_{ME}^2$ = "mechanical" measurement error as seen by differences in two readings.

In the analysis based on (3.6)-(3.10), we had

$$\hat\sigma_z^2 = \sigma_s^2 + \sigma_T^2, \qquad \hat\sigma_M^2 = \sigma_{ME}^2,$$

when we actually should have had

$$\sigma_z^2 = \sigma_s^2, \qquad \sigma_M^2 = \sigma_T^2 + \sigma_{ME}^2.$$

We have no estimate of $\sigma_T^2$ for the Framingham males aged 45-54, so we decided upon the following device. Let $0 \le \text{PVAR} < 1$ and define

$$\sigma_z^2(\text{new}) = \text{PVAR}\, \hat\sigma_M^2 + (1 - \text{PVAR})\, \hat\sigma_z^2,$$

$$\sigma_M^2(\text{new}) = (1 - \text{PVAR})\, \hat\sigma_M^2 + \text{PVAR}\, \hat\sigma_z^2.$$

Basically, PVAR is something like the proportion of variance due to diurnal or other unmeasured effects.

In Fig. 4, we plot the probit EIV risk functions for the cases PVAR = 0, 0.2 and 0.4, representing no, moderate and substantial time-of-day effects, respectively. What is clear from Fig. 4 is that, if there is a large time-of-day effect, our estimate (PVAR = 0.0) of the relationship between risk of CHD and "true" systolic blood pressure could be badly biased for high-risk patients.
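The variance reapportioning device is simple arithmetic; a Python sketch using the Framingham estimates quoted above ($\hat\sigma_z^2 = 1.14$, $\hat\sigma_M^2 = 0.10$) follows. The reliability ratio $\sigma_z^2/(\sigma_z^2 + \sigma_M^2)$ printed alongside is a summary we add for illustration and is not reported in the paper; the function name is hypothetical.

```python
def reapportion(var_z_hat: float, var_m_hat: float, pvar: float):
    """Section 6 device: mix the two variance estimates with proportion PVAR.

    Returns (var_z_new, var_m_new); their sum equals var_z_hat + var_m_hat.
    """
    if not 0.0 <= pvar < 1.0:
        raise ValueError("PVAR must lie in [0, 1)")
    var_z_new = pvar * var_m_hat + (1.0 - pvar) * var_z_hat
    var_m_new = (1.0 - pvar) * var_m_hat + pvar * var_z_hat
    return var_z_new, var_m_new

if __name__ == "__main__":
    # Framingham estimates from the text: var_z_hat = 1.14, var_m_hat = 0.10.
    for pvar in (0.0, 0.2, 0.4):
        vz, vm = reapportion(1.14, 0.10, pvar)
        # var_z / (var_z + var_m) is a reliability ratio, added here as a summary.
        print(f"PVAR={pvar:.1f}: var_z={vz:.3f}, var_m={vm:.3f}, ratio={vz / (vz + vm):.3f}")
```

Because the two formulas only shift variance between $\sigma_z^2$ and $\sigma_M^2$, their sum stays at 1.24 for every PVAR, and larger PVAR values attribute more of the observed spread to error.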
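Appendix

Proof of (2.6)

Assume as in Section 2 that

$$N^{1/2}(\hat\alpha - \alpha_0) = O_p(1). \qquad (A1)$$

Assume also the normalizing conditions

$$N^{-1} \sum_{i=1}^N c_i \to A, \qquad (A2)$$

$$N^{-1} \sum_{i=1}^N c_i^2 \to B. \qquad (A3)$$

Then (A2) and (A3) imply

$$\max\{|c_i|/N^{1/2} : 1 \le i \le N\} \to 0. \qquad (A4)$$

From (2.4) and the definition (2.2), it follows that

$$\lim_{\epsilon \downarrow 0}\ \max_{1 \le i \le N}\ \sup_{|\alpha - \alpha_0| < \epsilon} \frac{|\hat c_i(\alpha) - v_i|}{1 + |\alpha_0| + |c_i|} = O_p(1). \qquad (A5)$$

Further, by normality of $\{v_i\}$,

$$\max\{|v_i|/N^{1/2} : 1 \le i \le N\} \xrightarrow{p} 0. \qquad (A6)$$

Lemma A1. It follows that

$$\max\{|\hat c_i(\hat\alpha) - \hat c_i(\alpha_0)| : 1 \le i \le N\} \xrightarrow{p} 0. \qquad (A7)$$

Proof of Lemma A1. Make the definitions

$$H_i(u, \alpha) = u - c_i - v_i + \sigma_M^2\, \alpha \{G(\alpha u) - Y_i\},$$

so that, by (2.2) and (2.4),

$$H_i\{\hat c_i(\alpha), \alpha\} = 0.$$

The partial derivatives of $H_i$ are

$$D_1 H_i(u, \alpha) = \frac{\partial H_i}{\partial u} = 1 + \sigma_M^2\, \alpha^2 G(\alpha u)\{1 - G(\alpha u)\},$$

$$D_2 H_i(u, \alpha) = \frac{\partial H_i}{\partial \alpha} = \sigma_M^2 \left[ \{G(\alpha u) - Y_i\} + \alpha u\, G(\alpha u)\{1 - G(\alpha u)\} \right]. \qquad (A8)$$

By the chain rule,

$$\frac{d}{d\alpha} \hat c_i(\alpha) = -\frac{D_2 H_i\{\hat c_i(\alpha), \alpha\}}{D_1 H_i\{\hat c_i(\alpha), \alpha\}}.$$

Since $D_1 H_i \ge 1$ and $|D_2 H_i(u, \alpha)| \le \sigma_M^2 (1 + |\alpha u|/4)$, it follows from (A4)-(A6) and (A8) that for every M > 0,

$$N^{-1/2} \max_{1 \le i \le N}\ \sup_{|\alpha - \alpha_0| \le M/N^{1/2}} \left| \frac{d}{d\alpha} \hat c_i(\alpha) \right| = O_p\left\{ \max_{1 \le i \le N}\ \sup_{|\alpha - \alpha_0| \le M/N^{1/2}} \frac{1 + |\hat c_i(\alpha)|}{N^{1/2}} \right\} \xrightarrow{p} 0. \qquad (A9)$$

This means that for every M > 0,

$$\max_{1 \le i \le N}\ \sup_{|\alpha - \alpha_0| \le M/N^{1/2}} |\hat c_i(\alpha) - \hat c_i(\alpha_0)| \xrightarrow{p} 0,$$

which by (A1) completes the proof of Lemma A1.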
This crucial substitution is true in fairly general circumstances. It does not follow from ordinary likelihood calculations because, in the functional case, the number of parameters increases with the sample size.


Table 1

                        Usual Probit Regression     Errors-in-Variables Probit Regression
                        Intercept    Slope          Intercept    Slope
                        (= -1.0)     (= 1.0)        (= -1.0)     (= 1.0)
Mean                    -0.963       0.663          -1.011       1.070
Mean Square Error        0.0136      0.142           0.0158      0.110
Minimum                 -1.246       0.324          -1.371       0.454
Maximum                 -0.626       1.208          -0.663       2.368
Interquartile Range      0.148       0.244           0.174       0.403
Fig. 1. Average risks for simulation data. The predictor variable z is normal with mean zero and variance 0.25; the measurement errors are normal with mean zero and variance 0.25; the sample size is N = 200; there were 100 Monte-Carlo replications; the true probit risk parameters are intercept = -1 and slope = 1.

Fig. 2. Average MSE for simulation data (same simulation design as Fig. 1).

Fig. 3. Ratios of mean square errors for the usual probit versus probit EIV risk functions (same simulation design as Fig. 1).

Fig. 4. Probit risk functions on Framingham data. The case PVAR = 0.00 is the ordinary probit EIV risk function. PVAR is, in general, the mixing proportion used in apportioning the variances to take into account time-of-day variation.


